{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# File,Character Encoding and Table Data File\n",
    "\n",
    "\n",
    "## 1 Files\n",
    "\n",
    "\n",
    "Every computer system uses files to save things from one computation to the next.\n",
    "\n",
    "Python provides many facilities for creating and accessing files.\n",
    "\n",
    "The built-in function <b style=\"color:blue\">open</b> \n",
    "\n",
    "```python\n",
    "open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)\n",
    "``` \n",
    "\n",
    "Open file and return a stream(a file object)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "help(open)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The <b style=\"color:blue\">open()</b> is most commonly used with <b style=\"color:blue\">two arguments</b>:\n",
    "\n",
    "<b style=\"color:blue\">open(filename, mode)</b>.\n",
    "\n",
    "```python\n",
    "f = open('workfile', 'w')\n",
    "```    \n",
    "\n",
    "* The first argument is a `string` containing the filename. \n",
    "\n",
    "\n",
    "* The second argument is another `string` containing a few characters describing the way in which the file will be used. \n",
    " \n",
    "   * **Writing** : <b style=\"color:blue\">'w'</b> for only writing (an existing file with the same name will be erased)\n",
    "\n",
    "   * **Reading** :<b style=\"color:blue\">'r'</b> when the file will only be read(default) \n",
    "\n",
    "   * **Appending** :<b style=\"color:blue\">'a'</b>  open for writing, appending to the end of the file if it exists\n",
    "\n",
    "\n",
    "Normally, files are opened in <b style=\"color:blue\">text</b>  mode, that means, you read and write `strings` from and to the file.\n",
    "\n",
    "Here we illustrate some of the basic ones in the <b style=\"color:blue\">text</b> mode:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.1  Writing \n",
    "\n",
    "\n",
    "<b style=\"color:blue\">open</b>  create a file with the name **kids.txt** ,\n",
    "\n",
    "using the argument <b style=\"color:blue\">w</b> to indicate that the file is to be opened for **writing**.\n",
    "\n",
    " * an existing file with the same name will be erased\n",
    " \n",
    "using its <b style=\"color:blue\">write</b> methods appropriately to write to the file.\n",
    "\n",
    "In the end, we finally <b style=\"color:blue\">close</b> the file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "names=['David','Andrea']\n",
    "nameHandle = open('kids.txt', 'w')\n",
    "\n",
    "for name in names:\n",
    "    nameHandle.write(name + '\\n') # the string '\\n' indicates a new line character.\n",
    "\n",
    "nameHandle.close()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**name + '\\n'**: the string **'\\n'** indicates a **new line** character."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!dir kids.txt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# %load kids.txt\n",
    "David\n",
    "Andrea\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "### 1.2 Reading\n",
    "\n",
    "<b style=\"color:blue\">open</b> the file for **reading**,using the argument <b style=\"color:blue\">'r'</b>\n",
    "\n",
    "`'r'` will be assumed if it’s omitted.\n",
    "\n",
    "For reading lines from a file, you can `loop` over `the file object`. This is memory efficient, fast, and leads to simple code:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<_io.TextIOWrapper name='kids.txt' mode='r' encoding='cp936'>\n",
      "David\n",
      "\n",
      "Andrea\n",
      "\n"
     ]
    }
   ],
   "source": [
    "nameHandle = open('kids.txt', 'r')\n",
    "\n",
    "#nameHandle = open('kids.txt') #'r' will be assumed if it’s omitted.\n",
    "\n",
    "for line in nameHandle:\n",
    "    print(line)\n",
    "    \n",
    "nameHandle.close()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "You see **new line** between each name. \n",
    "\n",
    "```python\n",
    "David\n",
    "\n",
    "Andrea\n",
    "```\n",
    "Because\n",
    "\n",
    "* `\\n` at the end of each line -> the new  line\n",
    "\n",
    "* `print(line)` procude the new line\n",
    "\n",
    "```python\n",
    "print(*objects, sep=' ', end='\\n', file=sys.stdout, flush=False)\n",
    "```\n",
    "the default value `end='\\n'`, so that `print(line)` procude the new line\n",
    "\n",
    "We could have avoided printing that(**new line**) by writing:\n",
    "```python\n",
    "  print(line[:-1])\n",
    "```\n",
    "**slicing** line to delete **'\\n'** in each line for file. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "nameHandle = open('kids.txt', 'r')\n",
    "for line in nameHandle:\n",
    "    print(line[:-1])  # print(line[:len(line)-1] \\n\n",
    "nameHandle.close()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "### 1.3 Appending\n",
    "\n",
    "<b style=\"color:blue\">open</b> the file for **appending** (instead of writing) by using the argument  <b style=\"color:blue\">a</b>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "names=['Zhang\\n','Li\\n']\n",
    "\n",
    "# append\n",
    "nameHandle = open('kids.txt', 'a') # argument 'a' -  appending\n",
    "for name in names:\n",
    "    nameHandle.write(name)\n",
    "nameHandle.close()\n",
    "\n",
    "# read\n",
    "nameHandle = open('kids.txt', 'r')\n",
    "for line in nameHandle:\n",
    "    print(line[:-1])\n",
    "nameHandle.close()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.4 The common operations on files\n",
    "\n",
    "Some of the common operations on files are summarized\n",
    "\n",
    "![Figure412](./img/figure412.jpg)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Using <b style=\"color:blue\">readline</b> methods \n",
    "\n",
    "Using `print(line, end='')` ,so that print(line) do not get the new line"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "nameHandle = open('kids.txt', 'r')\n",
    "\n",
    "while True:\n",
    "    line = nameHandle.readline()\n",
    "    # Zero length indicates EOF\n",
    "    if len(line) == 0:\n",
    "        break\n",
    "   \n",
    "    # The `line` already has a newline at the end of each line\n",
    "    print(line, end='')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Character Encoding and Plain text file in a specific `encoding`\n",
    "\n",
    "`Text encoding` is a sufficiently complex topic that there’s no one size fits all answer - the right answer for a given application will depend on factors like:\n",
    "\n",
    "* how much control you have over the text encodings used\n",
    "\n",
    "* whether avoiding program failure is more important than avoiding data corruption or vice-versa\n",
    "\n",
    "* how common encoding errors are expected to be, and whether they need to be handled gracefully or can simply be rejected as invalid input\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.1  Character Encoding\n",
    "\n",
    "\n",
    "In computer memory, **character** are \"encoded\" (or \"represented\") using a chosen **\"character encoding schemes\"** (aka \"character set\", \"charset\", \"character map\", or \"code page\").\n",
    "\n",
    "**ASCII** \n",
    "\n",
    "For many years most programming languages used a standard called **ASCII** for the internal representation of characters. \n",
    "\n",
    "This standard included 128 characters, plenty for representing the usual set of characters appearing in `English-language` text—but `not enough to cover the characters and accents appearing in all the world’s languages.`\n",
    "\n",
    "For example, in **ASCII** (American Standard Code for Information Interchange),as well as Latin1, Unicode, \n",
    "\n",
    "* code numbers 65D (41H) to 90D (5AH) represents 'A' to 'Z', respectively.\n",
    "\n",
    "* code numbers 97D (61H) to 122D (7AH) represents 'a' to 'z', respectively.\n",
    "\n",
    "* code numbers 48D (30H) to 57D (39H) represents '0' to '9', respectively.\n",
    "\n",
    "It is important to note that the **representation scheme must be known** before a binary pattern can be interpreted. E.g., the 8-bit pattern \"0100 0010B\" could represent anything.\n",
    "\n",
    "**Unicode**.\n",
    "\n",
    "* **Unicode:** ISO/IEC 10646 Universal Character Set\n",
    "\n",
    "In recent years, there has been a shift to **Unicode**. \n",
    "\n",
    "The Unicode standard is a character coding system designed to support the digital processing and display of the written `texts of all languages`. The standard contains more than 120,000 different characters—covering 129 modern and historic scripts and multiple symbol sets.\n",
    "\n",
    "\n",
    " \n",
    "Unicode encoding scheme could represent characters in **all languages**.\n",
    "\n",
    "> **Reference** [Tutorial on Data Representation Integers, Floating-point Numbers, and Characters]( http://www3.ntu.edu.sg/home/ehchua/programming/java/DataRepresentation.html)\n",
    "\n",
    "---\n",
    "\n",
    "**The default encoding Python3：UTF-8**\n",
    "\n",
    "The **default encoding** for `Python source code`  and `Jupyter Notebook` is **UTF-8**,\n",
    " \n",
    "Since **Python 3.0**, the language features a **str** type that contain **Unicode** characters\n",
    "\n",
    "> https://docs.python.org/howto/unicode.html\n",
    "\n",
    "so you can simply include a Unicode character in a string literal:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Visual Studio Code default encoding: utf8**\n",
    "\n",
    "![](./img/vsc-encoding.jpg)\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can `Reopen with Encoding` ,or `save the Encoding ` to transfer the text to a suitable character coding system.\n",
    "\n",
    "![vscode-encoding](./img/vscode-encoding.jpg)\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will give more examples on Character Encoding in the following section "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**中文Windows默认字符集编码是中文GBK**\n",
    "\n",
    "* **GBK**: 汉字内码扩展规范(Chinese Internal Code Specification)\n",
    "\n",
    "[Problem_Solution](https://github.com/PySEE/home/blob/S2020/guide/doc/Problem_Solution.md)\n",
    "\n",
    "* [中文Windows10环境下，在VS Codo的PowerShell终端启动Jupyter Notebook后，用户界面为中文](https://github.com/PySEE/home/blob/S2020/guide/doc/Problem_Solution.md#%E4%B8%AD%E6%96%87windows10%E7%8E%AF%E5%A2%83%E4%B8%8B%E5%9C%A8vs-codo%E7%9A%84powershell%E7%BB%88%E7%AB%AF%E5%90%AF%E5%8A%A8jupyter-notebook%E5%90%8E%E7%94%A8%E6%88%B7%E7%95%8C%E9%9D%A2%E4%B8%BA%E4%B8%AD%E6%96%87)\n",
    "\n",
    "* [Windows下，MinGW-W64编译UTF-8编码的C++源码，生成的运行文件向终端输出中文乱码](https://github.com/PySEE/home/blob/S2020/guide/doc/Problem_Solution.md#windows%E4%B8%8Bmingw-w64%E7%BC%96%E8%AF%91utf-8%E7%BC%96%E7%A0%81%E7%9A%84c%E6%BA%90%E7%A0%81%E7%94%9F%E6%88%90%E7%9A%84%E8%BF%90%E8%A1%8C%E6%96%87%E4%BB%B6%E5%90%91%E7%BB%88%E7%AB%AF%E8%BE%93%E5%87%BA%E4%B8%AD%E6%96%87%E4%B9%B1%E7%A0%81)\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.2 Plain text file in a specific `encoding`\n",
    "\n",
    "We have been reading and writing strings from and to the file, which are encoded in `the default encoding, Unicode UTF-8` for Python3\n",
    "\n",
    "Both English and non-English characters can be represented in UTF-8"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fname=\"./code/python/default.txt\"\n",
    "f = open(fname,'w')\n",
    "f.write('中文default')\n",
    "f.close()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fname=\"./code/python/default.txt\"\n",
    "f = open(fname,'r')\n",
    "line=f.readline()\n",
    "print(line)\n",
    "f.close()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### The specific encoding\n",
    "\n",
    "We can read and write in a specific encoding  by using a keyword argument <b style=\"color:blue\">encoding</b> in the `open` function.\n",
    "\n",
    "Example: Writing  in GBK\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fname=\"./code/python/gbk.txt\"\n",
    "f = open(fname,'w',encoding=\"gbk\")\n",
    "f.write('中文-gbk')\n",
    "f.close()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Open GBK with GBK encoding"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "f = open(fname,'r',encoding=\"gbk\")\n",
    "line=f.readline()\n",
    "print(line)\n",
    "f.close()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Open GBK with UTF-8 encoding"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "f = open(fname,'r',encoding=\"utf-8\")\n",
    "line=f.readline()\n",
    "print(line)\n",
    "f.close()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Data Table (File), Dictionary and List\n",
    "\n",
    "|name     |  age   |\n",
    "|---------:|------:|\n",
    "|zhangsan  |  28    |\n",
    "|Lishi  |  18    |  \n",
    "\n",
    "\n",
    "**In the concept of Relation Database**\n",
    "\n",
    "```\n",
    "data table  -> Relation Database's Table(表）关系数据库的数据表\n",
    "    column -> field（域）: name\tage\t\n",
    "     row  -> record（记录）: zhangsan   28    \n",
    "```\n",
    "\n",
    "### 3.1 data table: `dict`\n",
    "\n",
    "**view data table in `Columns`** \n",
    "\n",
    "* **data table is a `dict`**, \n",
    "\n",
    "* **values of each field is `list`**\n",
    "``\n",
    "\n",
    "```python\n",
    "data table  -> dict: {key:value}\n",
    "    column -> key(string)\n",
    "    all row of each column -> value(list)\n",
    "```\n",
    "\n",
    "data table is **dict**，values of each field is `list`\n",
    "\n",
    "```python\n",
    "dict {field1:[],field2:[]....}\n",
    "```\n",
    "\n",
    "```python\n",
    "{'name': ['zhangsan', 'lishi'],\n",
    " 'age': [28.0, 18.0]\n",
    "\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Creating dict from the `file` of  data table**\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Overwriting ./data/personrecords.txt\n"
     ]
    }
   ],
   "source": [
    "%%file ./data/personrecords.txt\n",
    "name        age\n",
    "zhangsan    28\n",
    "lishi       18 "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```python\n",
    "{\"name\":[],\"age\"：[]}\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['name', 'age']\n"
     ]
    }
   ],
   "source": [
    "datatable={}\n",
    "datefile=open('./data/personrecords.txt')\n",
    "# 1 get string of field(column)\n",
    "fields=datefile.readline().split()\n",
    "print(fields)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "dict for datatable:\n",
      "{field1:[],field2:[]....}\n",
      "{'name': [], 'age': []}\n"
     ]
    }
   ],
   "source": [
    "# 2 create the dict of  data table\n",
    "# {\"name\":[],\"age\"：[]}\n",
    "for key in fields:\n",
    "    datatable[key] = []\n",
    "    \n",
    "#  datatable={ key:[] for key in fields: }\n",
    "print(\"dict for datatable:\\n{field1:[],field2:[]....}\")\n",
    "print(datatable)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'name': ['zhangsan', 'lishi'], 'age': [28.0, 18.0]}\n"
     ]
    }
   ],
   "source": [
    "# 3 read each row into the value list of key \n",
    "for row in datefile:\n",
    "    currow=row.split()\n",
    "    for i in range(len(fields)):\n",
    "        if fields[i]==\"age\":\n",
    "            datatable[fields[i]].append(float(currow[i]))\n",
    "        else:\n",
    "            datatable[fields[i]].append(currow[i])\n",
    "\n",
    "datefile.close()\n",
    "\n",
    "print(datatable)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**average age**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "23.0\n"
     ]
    }
   ],
   "source": [
    "from statistics import mean\n",
    "avgage=mean(datatable[\"age\"])\n",
    "print(avgage)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.2 data table `list` \n",
    "\n",
    "**view data table in `Rows`** \n",
    "\n",
    "* data table is a `list` of rows\n",
    "\n",
    "* each `row` is a `dict`\n",
    "\n",
    "```python\n",
    "list[dict]: [{field1:value,field2:value,*:*},{field1:value,field2:value,*.*},...]\n",
    "```\n",
    "\n",
    "\n",
    "|name     |  age   |\n",
    "|---------:|------:|\n",
    "|zhangsan  |  28    |\n",
    "|Lishi  |  18    |  \n",
    "\n",
    "\n",
    "\n",
    "```python\n",
    "[{\"name\":zhangsan,\"age\":28},\n",
    " {\"name\":lishi,\"age\":18}\n",
    "]\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['name', 'age']\n"
     ]
    }
   ],
   "source": [
    "datatable=[] # data table is a list\n",
    "datafile=open('./data/personrecords.txt')\n",
    "# 1 get string of field(column)\n",
    "fields=datafile.readline().split()\n",
    "print(fields)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[{'name': 'zhangsan', 'age': 28.0}, {'name': 'lishi', 'age': 18.0}]\n"
     ]
    }
   ],
   "source": [
    "# 2 read each row into dict：key is field string\n",
    "for row in datafile:\n",
    "    currow=row.split()\n",
    "    rowdict={} # 2.1 init dict\n",
    "    for i in range(len(fields)):\n",
    "        # 2.2 add key:value to dict\n",
    "        if fields[i]==\"age\":\n",
    "            rowdict[fields[i]]=float(currow[i])\n",
    "        else:\n",
    "            rowdict[fields[i]]=currow[i]\n",
    "    # 2.3 add dict to list:records\n",
    "    datatable.append(rowdict)\n",
    "\n",
    "datafile.close()\n",
    "print(datatable)      "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'name': 'zhangsan', 'age': 28.0}\n",
      "{'name': 'lishi', 'age': 18.0}\n",
      "zhangsan\n",
      "lishi\n"
     ]
    }
   ],
   "source": [
    "for item in datatable:\n",
    "    print(item)\n",
    "    \n",
    "for item in datatable:\n",
    "    print(item['name']) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**average age**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "23.0\n"
     ]
    }
   ],
   "source": [
    "from statistics import mean\n",
    "ages=[item[\"age\"] for item in datatable]\n",
    "avgage=mean(ages)\n",
    "print(avgage)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### What is programs?\n",
    "\n",
    "<b style=\"color:blue;font-size:110%\">Algorithms + Data Structures = Programs</b> is a 1976 book written by `Niklaus Wirth` covering some of the fundamental topics of computer programming, particularly that algorithms and data structures are inherently related. (https://en.wikipedia.org/wiki/Algorithms_%2B_Data_Structures_%3D_Programs)**\n",
    "\n",
    "The Turbo Pascal compiler written by **Anders Hejlsberg** was largely inspired by the \"Tiny Pascal\" compiler in **Niklaus Wirth**'s book.\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.3 CSV  \n",
    "\n",
    "#### 3.3.1 CSV: Comma-separated values\n",
    "\n",
    "https://en.wikipedia.org/wiki/Comma-separated_values\n",
    "    \n",
    "In computing, a comma-separated values (**CSV**) file stores **tabular** data (numbers and text) in **plain text**.\n",
    "\n",
    "* Each **line** of the file is a data **record**\n",
    ". \n",
    "* Each **record** consists of one or more **fields**, separated by <b style=\"color:red\">commas</b>.    \n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%file ./data/personrecords.csv\n",
    "name,age\n",
    "zhangsan,28\n",
    "lishi,18 "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "CSV is **a common data exchange format** that is widely supported by consumer, business, and scientific applications. \n",
    "\n",
    "For example, a user may need to transfer information from a **database** program that stores data in a proprietary format, to a **spreadsheet** that uses a completely different format. \n",
    "\n",
    "The database program most likely can export its data as \"CSV\"; the exported CSV file can then be imported by the spreadsheet program\n",
    "\n",
    "The `CSV` format is the most common import and export format for `spreadsheets and databases`.\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3.3.2 CSV module\n",
    "\n",
    "The `csv` module implements classes to read and write tabular data in CSV format.\n",
    "\n",
    "https://docs.python.org/3.7/library/csv.html\n",
    "\n",
    "##### csv.DictReader"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Overwriting ./data/personrecords.csv\n"
     ]
    }
   ],
   "source": [
    "%%file ./data/personrecords.csv\n",
    "name,age\n",
    "zhangsan,28\n",
    "lishi,18 "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "fields = ['name', 'age']\n",
      "{'name': 'zhangsan', 'age': '28'}\n",
      "zhangsan 28.0\n",
      "{'name': 'lishi', 'age': '18 '}\n",
      "lishi 18.0\n"
     ]
    }
   ],
   "source": [
    "import  csv\n",
    "filename=\"./data/personrecords.csv\"\n",
    "csvfile = open(filename, 'r')\n",
    "csvdata = csv.DictReader(csvfile)\n",
    "# fields\n",
    "print(\"fields =\",csvdata.fieldnames)\n",
    "# values\n",
    "for line in csvdata:\n",
    "    print(line)\n",
    "    name = line['name']\n",
    "    age=float(line['age'])\n",
    "    print(name,age) \n",
    "csvfile.close()    "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### csv.writerow"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "import csv\n",
    "fields=['name','age','city']\n",
    "rows=[['Zhangsan',28,'Nanjing'],\n",
    "      ['Lishi',18,'Beijing']\n",
    "     ]\n",
    "csvfile=open(\"./data/personrecords.csv\",\"w\", newline='') \n",
    "csvwriter = csv.writer(csvfile,dialect=(\"excel\"))\n",
    "csvwriter.writerow(fields)\n",
    "for record in rows:\n",
    "    csvwriter.writerow(record)\n",
    "csvfile.close()            "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%load \"./data/personrecords.csv\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Further Reading"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###  Binary  Files\n",
    "\n",
    "`'b'` appended to the mode opens the file in `binary mode`: now the data is read and written in the form of `bytes` objects. \n",
    "\n",
    "This mode should be used for all files that don’t contain text."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "f = open('binaryfile', 'wb')\n",
    "f.write(b'0123456789abcdef')\n",
    "f.close()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To change the file object’s position, use `f.seek(offset, from_what)`. The position is computed from adding offset to a reference point; the reference point is selected by the `from_what` argument.\n",
    "\n",
    "A `from_what` value of \n",
    "\n",
    "* `0` measures from the beginning of the file, \n",
    "\n",
    "* `1` uses the current file position, and \n",
    "\n",
    "* `2` uses the end of the file as the reference point. \n",
    "\n",
    "`from_what` can be omitted and defaults to `0`, using the beginning of the file as the reference point."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "f = open('binaryfile', 'rb')\n",
    "f.seek(5) # Go to the 6th byte in the file"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "f.read(1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "f.seek(-3, 2) # Go to the 3rd byte before the end"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "f.read(1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "**Reference**\n",
    "\n",
    "\n",
    "* **J. M. Hughes. Real World Instrumentation with Python: CHAPTER 12 Reading and Writing Data Files**\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* [Python HOWTOs: Unicode HOWTO](https://docs.python.org/howto/unicode.html)\n",
    " \n",
    "\n",
    "* [Python Tutorial 7.2. Reading and Writing Files](https://docs.python.org/tutorial/inputoutput.html#reading-and-writing-files)\n",
    " \n",
    " \n",
    "* [Processing Text Files in Python 3](http://python-notes.curiousefficiency.org/en/latest/python3/text_file_processing.html)\n",
    "  \n",
    "\n",
    "* [Pandas](http://pandas.pydata.org/)  is an open source, BSD-licensed library providing high-performance, easy-to-use **data structures and data analysis tools** for the Python programming language.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.7"
  },
  "latex_envs": {
   "LaTeX_envs_menu_present": true,
   "autoclose": false,
   "autocomplete": true,
   "bibliofile": "biblio.bib",
   "cite_by": "apalike",
   "current_citInitial": 1,
   "eqLabelWithNumbers": true,
   "eqNumInitial": 1,
   "hotkeys": {
    "equation": "Ctrl-E",
    "itemize": "Ctrl-I"
   },
   "labels_anchors": false,
   "latex_user_defs": false,
   "report_style_numbering": false,
   "user_envs_cfg": false
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": false,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {
    "height": "calc(100% - 180px)",
    "left": "10px",
    "top": "150px",
    "width": "207.646px"
   },
   "toc_section_display": true,
   "toc_window_display": true
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}